Word extraction using irregular pyramid

Authors

  • Poh Kok Loo
  • Chew Lim Tan
Abstract

This paper proposes a new algorithm for extracting text from imaged documents, focusing on the extraction of word groups. The algorithm is built on an irregular pyramid structure. Its distinguishing feature is that it includes strategic background information in the analysis, which most techniques discard. Both foreground (i.e., text) regions and a portion of the background (i.e., white) regions are examined. The algorithm rests on the concept of “closeness”: the text within a group lies closer together, in terms of spatial distance, than it does to other text areas. The results are encouraging, with the algorithm able to correctly group words of different sizes, fonts, arrangements, and orientations.
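The “closeness” idea can be illustrated with a small sketch. This is not the irregular pyramid algorithm from the paper, and it ignores the background regions the paper analyses. It only shows the underlying notion of grouping text components whose spatial gap is small, using a plain union-find over bounding boxes. The box format `(x0, y0, x1, y1)`, the function names, and the fixed `max_gap` threshold are all assumptions made for illustration.

```python
from itertools import combinations


def box_gap(a, b):
    """Euclidean gap between two axis-aligned boxes (x0, y0, x1, y1).

    Returns 0 when the boxes touch or overlap.
    """
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return (dx * dx + dy * dy) ** 0.5


def group_by_closeness(boxes, max_gap):
    """Group box indices so that boxes linked by gaps <= max_gap share a group.

    max_gap is a hypothetical fixed threshold, not a parameter from the paper.
    """
    parent = list(range(len(boxes)))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pair of boxes that lie close enough to each other.
    for i, j in combinations(range(len(boxes)), 2):
        if box_gap(boxes[i], boxes[j]) <= max_gap:
            parent[find(i)] = find(j)

    # Collect the indices under each root.
    groups = {}
    for i in range(len(boxes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Five character-sized boxes on one line: three close together, then two more.
boxes = [(0, 0, 10, 10), (12, 0, 22, 10), (24, 0, 34, 10),
         (60, 0, 70, 10), (72, 0, 82, 10)]
print(group_by_closeness(boxes, max_gap=3))
```

A single fixed threshold such as `max_gap` would not cope with text of varying size and orientation, which is the case the abstract says the pyramid-based method handles. The sketch is meant only to make the distance-based grouping concrete.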

Similar resources

Word and Sentence Extraction Using Irregular Pyramid

This paper presents the result of our continued work on a further enhancement to our previously proposed algorithm. Moving beyond the extraction of word groups, and based on the same irregular pyramid structure, the newly proposed algorithm groups the extracted words into sentences. The uniqueness of the algorithm is in its ability to process text of a wide variation in terms of size, font, orientati...

Detection of Word Groups Based on Irregular Pyramid

This paper proposes a new algorithm to detect word groups in imaged documents, using an irregular pyramid. The uniqueness of this algorithm is its inclusion of strategic background information in the analysis, which most techniques discard. Both foreground (i.e., text) regions and a portion of the background (i.e., white) regions are examined. The algorithm is fundamentally based on the conce...

Adaptive Region Growing Color Segmentation for Text Using Irregular Pyramid

This paper presents the result of an adaptive region growing segmentation technique for color document images using an irregular pyramid structure. The emphasis is on the segmentation of textual components for subsequent extraction in document analysis. The segmentation is done in the RGB color space. A simple color distance measurement and a category of color thresholds are derived. The propo...

Using Irregular Pyramid for Text Segmentation and Binarization of Gray Scale Image

Compared to the binary images that most text extraction methods work on, gray scale images provide much more information for the extraction task. On the other hand, complications also arise in separating the textual content from its background region (i.e., thresholding) before the actual text extraction process can begin. Differing from the usual sequence of processes where document images ...

Using Irregular Pyramid for Text Segmentation and Binarization of Gray Scale Images

Compared to the binary images that most text extraction methods work on, gray scale images provide much more information for the extraction task. On the other hand, complications also arise in separating the textual content from its background region (i.e., thresholding) before the actual text extraction process can begin. Differing from the usual sequence of processes where document images a...

Publication date: 2001